ABSTRACT

This document discusses the role and the scope of
evaluation in the mathematics classroom. The scope of mathematics
objectives to be evaluated, the scope of evaluation purposes in the
mathematics classroom, and the scope of evaluation procedures are
noted. Specific comments are made on various evaluation procedures,
including: (1) observations; (2) interviews; (3) inventories and
checklists; (4) attitude scales; (5) criterion-referenced tests; (6)
norm-referenced tests; (7) standardized tests; and (8) diagnostic
tests. Both general and specific suggestions for planning tests and
for writing various test items are included. Types of test items
discussed include multiple choice, true-false, matching, completion
and essay. An extensive list of selected references is included to
direct attention to documents which will provide additional help.
(TW)
Foreword



To some extent, this booklet is being revised at an inopportune 
time. Evaluation is the focus of attention of groups at both state 
and national levels, and much clarification and development is 
underway. Little of the results of current activities can be
reflected at this point, but perhaps a revision will be needed sooner
than the 12 years that have elapsed since this booklet was first
published.

The impact of international, national, and state assessments of
achievement on curricular goals is one cause for the focusing of
attention. Data from the fourth National Assessment of Educational
Progress in mathematics will appear in the near future, and it will
surely be reviewed as carefully as the previous assessments have been.
Information both on status (how well are students achieving
currently) and on change (what, if any, progress has been made
since previous assessments) is of vital interest. Data from the
Second International Study of Mathematics recently attained headlines,
with the ranking of the United States well below almost every other
country on most of the achievement scales. Results from state
assessments in mathematics, collated by Suydam (1984), indicated some
areas of strength and many areas of weakness. The public, as well as
educators, desires improvement.

Mathematics educators also have reached consensus on the need for
change in both curriculum and instruction. The National Council of
Teachers of Mathematics began this decade by publishing An Agenda for
Action: Recommendations for School Mathematics of the 1980s. One of
the eight recommendations focused on evaluation:

The success of mathematics programs and student learning must
be evaluated by a wider range of measures than conventional
testing.

Noting that "many people use test scores as the sole index of the 
quality of mathematics programs or of the success of student 
achievement," the Council made a concerted plea for evaluation 
measures which assess the full range of the goals of the mathematics
program, including not only skills but also problem solving and
problem-solving processes. More recently, a Task Force was appointed
to study the role that testing and evaluation should play in 
mathematics programs, and ways of putting into practice evaluation 
strategies consistent with the goals and objectives of mathematics 
education.

Concurrently, the Mathematical Sciences Education Board has
identified evaluation as one of its major strands of interest. The
Board noted that:
Methods of evaluation, especially standardized,
paper-and-pencil, multiple-choice tests of 'basic skills', are
being used across the country without sufficient reflection and
are themselves obstacles to the teaching of new methods and
higher-order thinking skills, as well as to the use of
calculators and computers. The nation is in the grip of a
testing mystique which has led to the widespread use
of such tests in spite of repeated warnings that several
premises upon which their use is based are open to serious
question.

Working cooperatively with the NCTM, the MSEB is developing
recommended standards or criteria for excellence in school
mathematics; "part of this effort will involve the development of
guidelines for redesigning tests and other assessment mechanisms so
they are properly aligned with the curriculum and provide meaningful
evaluation of student achievement." Questions about the validity of
existing tests, including the degree to which they match what is being
taught, the continued use of tests that inhibit or prohibit curricular
change, and the misuses of assessment information have all been
raised.

The impact of technology is clearly a part of the need for
reform. Computers can deliver adaptive tests that can reduce the
length of tests while preserving precision, and at the same time
standardize administration and, of even more import, make results
immediately available to the teacher. Tests which admit the use of
calculators must clearly be developed; ten years ago, when they first
became cost-feasible for classroom use, it was inconceivable that
their existence could be ignored. Moreover, tests that fail to take
into account such vital curricular strands as probability and
statistics or problem-solving processes have survived past their time.

This may be an inopportune time for this booklet to appear; in 
another way, there is no inopportune time for such a booklet. Its 
purpose is to help classroom teachers extend their awareness of ways 
to evaluate and their skills in developing appropriate evaluative 
measures. It may help them prepare for the future. 

It is intended as a quick reference guide rather than as an 
encyclopedia on evaluation of mathematics instruction. Its aim is to 
help teachers to review, to supplement, to develop questions about a 
process they use every day. The list of references should help them
delve further into answers for their questions. 

Two emphases are foremost: 

(1) Evaluation means much more than paper-and-pencil tests.

(2) Each evaluation measure should be as good as possible. 
Research in classrooms has indicated that teachers use many evaluation
procedures. So, with awareness that change must continue, let's turn
to the classroom and the ways teachers evaluate . . .
Evaluation

The following statement is an official NCTM position.
It was developed by the Professional Development and
Status Advisory Committee and adopted by the Board of Directors.



INCREASING demands for accountability
have led state and provincial legislatures
and agencies, school districts, professional
organizations, and teacher education
institutions to expand their efforts to evaluate
teachers' knowledge and performance.
Although such evaluations are often used as a
basis for decisions about admission to
teacher education programs, eligibility for
certificates, or advancement in the profession,
the most important purpose of evaluation
is the personal and professional growth
of the individual teacher and the improvement
of teaching. Consequently, evaluation
should be a cooperative process between
the teacher and the evaluators.

Evaluation includes the identification of
goals by the teacher and the evaluators, the
collection of information, and a collaborative
dialogue between the teacher and the
evaluators to reformulate, redirect, and refine
goals for the future. Goals for personal
and professional growth may include some
that are mandated by the state or province,
district, teacher education institution, or
individual school, but the teacher must be an
active participant in identifying goals of a
more specific nature.

Evaluation should not be limited to a
single instrument, such as to paper-and-pencil
testing of the students' or the
teacher's knowledge alone, to checklists of
isolated behaviors, or to a single observation
session. Data should be gathered from
various sources, including, but not necessarily
limited to, the teacher, peers, students,
supervisors, and administrators.

The use that is made of the information
gained through the evaluation process is as
important as the act of evaluation itself. The
appropriate outcome of this ongoing process
is a collaborative dialogue between the
teacher and others involved in the process,
resulting in a mutually agreed-on plan for
professional growth.

Although the process of evaluating the
effectiveness of teachers may be applicable
across subjects and grade levels, the specific
goals, criteria for observation, and resulting
dialogue must be directly related to the content
of mathematics and to the teaching
strategies. The evaluation team should represent
expertise and experience in mathematics
and the teaching of mathematics as
well as in evaluation.

Therefore, the NCTM recommends that
supervisors and administrators work closely
with teachers to assure that the evaluation
process is used to enhance the professional
development of the teacher and increase the
effectiveness of mathematics teaching.

(March 1987) 
Evaluation in the Mathematics Classroom:
From What and Why to How and Where

Introduction 



Imagine a classroom. Perhaps it's your classroom.

Imagine 25 or 30 students in that classroom. Perhaps they're
your students.

Imagine the students sitting at desks.

Imagine you see the students clear everything off the tops of the
desks, except for a pencil.

What did the teacher say at the point the asterisk appeared?

Imagine the sound of the teacher's voice. Insert the words the
teacher says in place of the asterisk. The words are: "Clear your
desks. Take out a pencil. You are now going to have a test."

When we think of evaluation in the mathematics classroom, tests
come immediately to mind . . . tests where students sit at desks and
write or circle or draw lines.

But is that all there is?

Imagine that same classroom three days ago. Groups of students
are scattered around the room. Some are spinning a three-colored cube
and making a record of what color lands upward each time. Several are
making a graph on a bulletin board. Others are stretching yarn
against various objects in the room. Some are seated with diagrams
and worksheets, with games, with other materials before them.



Where is the teacher? What is he or she doing?

Is any evaluation occurring in the classroom at that moment?

Imagine the classroom four days ago. The students sit at their
desks. The teacher stands near the chalkboard. She writes some
numerals on the board. She asks a question. Several students in turn
respond. She asks another question. One student comes to the board
and draws a diagram. The teacher queries the group by raising her
eyebrows. Three students shake their heads "no," four nod "yes," the
others look puzzled. The teacher asks another question.

Is any evaluation occurring in the classroom during this lesson?
Imagine the classroom five days ago. The students have moved
their desks so they have tables grouped by fours. Each group follows
the directions of a leader as they manipulate materials on the desks.
They help each other; they talk about what they find happening. Then
each records a response on a worksheet.

Is any evaluation occurring as they work on this lesson? 

The answer to each question is obvious. If the teacher is
teaching, the teacher is evaluating almost every minute on each of the
days imagined, and on any other day you want to imagine. Sometimes
the evaluation leads to an immediate reaction: you smile approval or
you frown; you say "good answer!"; you say "that's on the right
track"; you word a question so the student might see an error in the
last response; you skip several questions because students are ready
to move more quickly; you introduce a subtraction sentence instead of
working only with objects; you get rods as an alternative way of
clarifying a mathematical idea. Sometimes the evaluation leads to
notes on students' anecdotal records, a comment on a problem to
pursue further, a change of lesson plans for next week.

Evaluation in the mathematics classroom consists of much more
than a testing program involving paper-and-pencil tests on
mathematical content. Measurement of the content goals of mathematics
is comparatively easy: you can readily obtain an objective measure of
certain computational skills and specific mathematical processes that
form a portion of the mathematics curriculum. Measuring other goals
of the mathematics curriculum, such as problem solving, is more
difficult. Evaluation includes a wide variety of means of collecting
evidence on students' behaviors; rating scales, questionnaires,
checklists, reports from parents, student activities, and samples of
students' work all provide useful evidence of behavior and progress.
Observing, listening, presenting a task, interviewing: each makes a
valid and viable contribution to the evaluation process.

But sometimes you evaluate with paper-and-pencil tests.
Paper-and-pencil instruments have their place; they supplement other
forms of evaluation. The very process of preparing for and taking a
test helps students to synthesize what they have learned. The
responses to specific items help the teacher to diagnose a weakness or
confirm what was seen in the day-by-day process of observing student
reactions and behaviors. Both students and teachers take stock: this
mathematical idea or fact or skill or concept has been mastered and
can be used in developing newer content. Another mathematical idea or
fact or skill or concept needs to be given more thought or practice or
development.

One of the purposes of this booklet is to help you to develop 
better paper-and-pencil measures. Tests continue to be a part of the 
educational environment, if only because they provide a feasible way 
of finding out, in a relatively short amount of time, what or how well 
each child is learning certain content. Tests yield concrete and
detailed evidence economically and in convenient form. Tests are
tools whose value lies not merely in their use but in
the skill and understanding of the teacher. Good tests do not just
happen: they require much thought and careful planning and thorough
analysis.

Another purpose of this booklet is to review other possible 
approaches to evaluation. What they are and how they can be useful 
are each considered. Finally, some pertinent literature on evaluation 
in mathematics is noted. Some references are inserted directly into 
the text; most are included in the list of references at the end of 
the booklet without being cited. 



II. The scope of evaluation 

Evaluation is a continuing, integral aspect of mathematics 
teaching and is essential for improving instruction. Evaluation 
ascertains whether the teacher is teaching what he or she thinks is 
being taught and the learner is learning what the teacher thinks the
learner is learning. Thus, there must be a match between what is
being taught and what is evaluated. Evaluation is qualitative as well
as quantitative. It involves appraisal as well as measurement, for it
includes the stage of making value judgments. This stage occurs when 
the means of evaluation is chosen, when it is applied, and when the 
results are judged. 

Evaluation takes a variety of forms, since there is no one 
technique that is equally appropriate for measuring all aspects of 
learning. Both cognitive factors and affective factors must be
considered; the feeling and the doing aspects as well as the knowing
and the thinking aspects are important in every mathematics program.



A. The scope of mathematics objectives to be evaluated



Scope-and-sequence charts in textbooks and curriculum guides 
provide one way of determining the dimensions of the mathematics 
program. Some mathematics educators have described the scope in
various ways; for example:

In the study of mathematics a student must learn facts,
develop concepts, use symbols, and master processes and
procedures. But he [or she] should also learn to develop
generalizations and to sense the presence of mathematical
ideas and structures not only in abstract situations but
also in many areas of human activity. He [or she] should
develop his [or her] reasoning powers in order to prove or
disprove a statement by deduction or to predict an event
with appropriate probability. It is the function of
evaluation to determine how well a student has mastered
these varied aspects of mathematics.

[Sueltz, 1961]

Other writers have developed models to aid in the process of 
designing instructional materials and tests. The taxonomy developed 
by a committee working with Bloom has long provided a basic model for 
the analysis of educational goals in general (Bloom, 1956; Bloom et
al., 1971; Krathwohl et al., 1964). Bloom's Taxonomy is presented in
terms of two domains, the cognitive and the affective. The cognitive
domain, not surprisingly, has been of most concern to those evaluating
mathematics instruction, even though the importance of the affective
domain is recognized. Goals in the cognitive domain have been
organized into six main categories:

1. Knowledge: recognizing or recalling specific material

2. Comprehension: grasping the meaning of material

3. Application: using information in concrete situations

4. Analysis: breaking down material into its parts

5. Synthesis: putting together parts to form a whole

6. Evaluation: judging the value of material and methods
for given purposes

Goals in the affective domain are organized into five categories:
receiving, responding, valuing, organizing, and characterizing by a
value. These categories have not been used nearly as frequently as
those in the cognitive domain.

Other models have been developed that are more specific to the
goals of mathematics education. Generally, such models have combined
some categories of Bloom's Taxonomy, or they have used labels more
specifically identified with mathematics. Thus, the School
Mathematics Study Group (SMSG) used four categories to assess the
cognitive domain: computation, comprehension, application, and
analysis.

More recently, the National Assessment of Educational Progress
(NAEP) has modified the framework used for planning the evaluation of
mathematics objectives. For the fourth national assessment, the
mathematics objectives are organized into five broad areas (NAEP,
1985):

1. Problem solving/Reasoning

2. Routine application

3. Understanding/Comprehension

4. Skill 

5. Knowledge 

Higher-order thinking skills, familiar applications, interpretations 
of underlying concepts and relationships, routine manipulations with 
standard procedures, and recall and recognition of mathematical 
content are thus assessed by the five categories. 

These models have each aided curriculum developers and test 
constructors. Yet many teachers find it difficult to recall the 
categories, and even more difficult to apply them. Pikaart and 
Travers (1973) simplified the model so that it would really help
teachers to describe specific learning goals, yet be comprehensive, 
flexible, and functional. They described three dimensions — goals or 
products, content, and teacher behavior or processes, including 
planning, teaching, and evaluation. They noted that in practice it is 
difficult to distinguish activities that are planned for either 
cognitive goals or affective goals, since these are interrelated and 
interwoven in instruction. Therefore, the same model may be
considered for both domains:

1. Knowledge

a. Statements 

b. Basic skills 

2. Understanding 

a. Concepts 

b. Principles 

3. Problem solving 

a. Formulating hypotheses and testing them 

b. Proving theorems 

c. Solving non-routine problems 

Categories or levels are important to consider in setting goals 
and developing objectives for instruction, in planning instructional 
activities and procedures, and in evaluating instructional outcomes. 
Too frequently, mathematics evaluation encompasses only the lowest 
level — knowledge. It is easy to construct an objective test at the 
knowledge level; it is much more difficult to construct tests and
other evaluation procedures that assess higher cognitive levels. 
Perhaps the greatest contribution of a model detailing the various 
categories is that it makes everyone aware of the need to evaluate 
higher-level outcomes. 
B. The scope of evaluation purposes

Every teacher evaluates for at least three purposes:

1. To assess the mathematics program in the classroom and
in the school.

The success of your mathematics program is not determined by how 
well it compares with the program in other schools. The important
concern is the impact that it has on helping your particular students 
to learn mathematics. Is the content appropriate for your students? 
How well are they progressing toward the mathematical goals you have 
set? Are they able to apply their knowledge and skill in new 
directions? Does the program make the students want to continue to 
learn more mathematics? Do they enjoy doing and using mathematics? 
Is the mathematical content important and worthwhile? Is the program 
teachable and learnable? 

Comparisons with other students in other schools can help you to
attain some perspective on how well your students are doing, however.
The National Assessment of Educational Progress (Carpenter et al.,
1978, 1981; NAEP, 1983) and various state assessment programs (Suydam,
1984) attempt to provide such perspectives. But you are not teaching
"other students in other schools". Your goal must be to help all of
the students in your classroom to learn and to enjoy mathematics as
well as each is able.

A guide to assessing the mathematics program in the school which 
reflects more than accountability test scores has been prepared by the 
National Council of Teachers of Mathematics (NCTM, 1981). It notes 
that, to provide a comprehensive mathematics program, the total school 
staff must be committed to:

1. meeting the needs, abilities, interests, and capabilities
of each student;

2. exhibiting positive attitudes toward mathematics;

3. developing positive student attitudes toward mathematics;

4. preparing students to use mathematics successfully in
their future vocations, avocations, and leisure time.

Twenty-one standards are then presented, concerned with instruction,
the curriculum and instructional materials, the teacher of
mathematics, and physical facilities and equipment. Appropriate
questions to assess the attainment of each standard can be of help in
evaluating the mathematics program.
2. To assess the achievement of the students in each classroom.

The vital factor to note in assessing achievement is that you
must evaluate students in terms of both progress and status. Testing
supplements other evaluation procedures as a means of ascertaining how
well students have succeeded in mastering vital content and acquiring
important skills.

3. To diagnose individual strengths and weaknesses.

You can use test results to place students in instructional 
materials, to group students for instruction, and to assign grades. 
You can also use test results to help you to learn more about how to 
teach more effectively. 

Far too many mathematics tests consist simply of examples for 
which students are to provide answers. Far too often these tests are 
corrected by a check for correct and incorrect answers. The teacher 
who merely obtains the total score made by a student on a test is 
overlooking the greatest value of the test for instructional purposes.
Alas, so much is thrown in the wastebasket! Analysis of how the
student reached the correct or incorrect answer is much more important
than merely whether the answer was right or wrong. Analysis of 
performance on individual questions can tell you more than a total 
score can. 
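
The contrast between a total score and a question-by-question analysis can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the answer key, the three students' responses, and the function names `total_scores` and `item_analysis` are all hypothetical, and a multiple-choice format is assumed simply because it is easy to tally.

```python
from collections import Counter

# Hypothetical data: a four-item multiple-choice test and three students.
answer_key = ["B", "D", "A", "C"]
responses = {
    "Sue":  ["B", "D", "C", "C"],
    "Mark": ["B", "A", "C", "C"],
    "Kim":  ["B", "D", "C", "A"],
}


def total_scores(responses, key):
    """Number of correct answers per student (the 'total score' view)."""
    return {
        name: sum(given == correct for given, correct in zip(answers, key))
        for name, answers in responses.items()
    }


def item_analysis(responses, key):
    """Per-item view: fraction correct and a tally of each wrong answer."""
    report = []
    for index, correct in enumerate(key):
        given = [answers[index] for answers in responses.values()]
        wrong = Counter(answer for answer in given if answer != correct)
        report.append({
            "item": index + 1,
            "fraction_correct": given.count(correct) / len(given),
            "wrong_answers": dict(wrong),
        })
    return report


scores = total_scores(responses, answer_key)
report = item_analysis(responses, answer_key)
```

With this made-up data the total scores look unremarkable, but the per-item report shows that every student chose the same wrong answer on item 3, which is the kind of pattern a total score hides and a teacher would want to follow up.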

Evaluation procedures other than tests are invaluable in 
providing diagnostic information. As you listen and observe, you
build the basis for interpreting test scores and deciding how to 
structure your teaching. 

C. The scope of evaluation procedures

This section contains comments on various types of evaluation 
techniques: first, non-paper-and-pencil procedures; then, 
paper-and-pencil instruments. 

1. Observations

Many mathematics lessons have a component in which students work 
in small groups or individually on tasks, assignments, or worksheets. 
This is a time when evaluating students' mathematical behavior is of
singular importance. You can move about the room, observing students 
as they work, listening as they talk among themselves, making notes, 
questioning, making suggestions. You also observe during discussion 
periods, but your involvement in the discussion sometimes keeps you
from attaining perspective; then you need to use your evaluation
immediately as you continue the discussion. You have little chance to
make notes. Your primary purpose is to guide. When you are free to 
observe as children work independently, you can evaluate even more 
effectively, with a defined perspective, and you can limit your 
observation to specific aspects of student behavior. 
Note the method of attacking problems used by a student, and how
he or she proceeds to work on a problem. Note the expression on Sue's
face, her mannerisms, her concentration. Note how consistently she
works, where she meets difficulty, when she becomes careless. Observe
the emotional climate of the room. Observe the student's level of
independence. Does Mark really need your help when he raises his
hand, or does he need encouragement or praise? How dependent is he on
help from you, from textbooks, from other students? Does he try
various ways of solving a problem, or does he try to apply the last
procedure used in class?

Make a simple memo that describes the situation and the behavior
you've observed: an anecdotal record. Use a small notepad or cards.



Name     Date     Situation     Behavior     Comment

[Handwritten sample entries for Sue, only partly legible: one made
during group work with graph paper, and one noting that she missed
most combinations when multiplying by 7 or 9, with a comment to
redevelop and practice with 7 and 9.]



File the anecdotal records in the student's folder, in which you also 
place examples of daily work, project reports, and other papers.
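
If you keep such records electronically rather than on cards, the same five columns carry over directly. The following Python sketch is one possible arrangement, not a prescribed format: the class names `AnecdotalRecord` and `StudentFolders`, the dates, and the sample entries are all hypothetical, loosely modeled on the kind of note described above.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class AnecdotalRecord:
    """One memo, with the same columns as the card: name, date, situation, behavior, comment."""
    name: str
    when: date
    situation: str
    behavior: str
    comment: str = ""


class StudentFolders:
    """Keeps each student's records together, like a folder per student."""

    def __init__(self):
        self._folders = defaultdict(list)

    def file(self, record):
        self._folders[record.name].append(record)

    def records_for(self, name):
        # Return the student's records in date order, oldest first.
        return sorted(self._folders.get(name, []), key=lambda r: r.when)


# Hypothetical entries, filed out of order on purpose.
folders = StudentFolders()
folders.file(AnecdotalRecord("Sue", date(1987, 3, 12), "computation",
                             "missed most combinations with 7 and 9",
                             "redevelop and practice"))
folders.file(AnecdotalRecord("Sue", date(1987, 3, 5), "group work",
                             "asked a partner for help with graph paper"))
sue_history = folders.records_for("Sue")
```

Sorting by date when the folder is read, rather than when a memo is filed, means notes jotted down late can still be added without disturbing the record of progress over time.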

Sometimes audiotape or videotape can be used to provide a record 
that you can go back over and analyze in more detail than when you are
involved with the group. Photographs can provide a record of project
work and "products". You can compare progress with more objectivity
than simply through memory of what was done. 

2. Interviews

An interview is an attempt to remove the restriction of writing, 
both that involved in your development of a test item and that of the
student in developing an answer. You can delve more precisely into
how a student solves an example or problem. You can learn how he or
she goes about finding answers. You can follow as he or she describes 
what he or she is doing, and why. 

Basically, the interview procedure is simple (Weaver, 1955):

(1) Face the student with a problem.

(2) Let him or her find a solution, as he or she tells you
what he or she is doing.

(3) Challenge him or her to elicit his or her highest level
of understanding.

Present Pat with an example written on a card:



48 JM? 

Have him explain the procedure he follows while computing the
answer.

Make notes as he works; sometimes it's helpful to have an exact
record of what he says. Challenge him with such questions as, "Are
you sure that's correct?" "What if I said the answer was ___?" "Is
there any other way you could find the answer?" And remember that the
two most important questions in an interview are "How?" and "Why?".

Other suggestions for interviewing include:

(1) Establish rapport and maintain a relaxed atmosphere. The
student needs to understand what he or she is to do. You don't want
Karen to search for the answer she thinks you want; you want her
answers, not yours. And you want to know what she's thinking. You
want her to respond naturally, freely, and fully.



(2) Select your examples and questions for your purpose. At
times, you'll interview only some students; at other times, the whole
class. Use more than one example of a particular type, to determine
how consistently a student works.

(3) Don't teach: don't give answers, and avoid leading questions.
Do as little talking as you can; you want to find
out what the student is thinking.

(4) Record the student's answers and thinking and whatever he or
she does, as you go. You may want to write fast, or tape record, or
use a shorthand or code, using an interview form. Don't rely on
memory to make a true record after the interview is over. Careful
records will help you ascertain patterns and provide other evidence
for diagnostic teaching.

(5) Time may be a problem, or it may be an excuse. If you are 
serious about using interviewing as a means of finding out more about 
what students have learned and are learning, the time can be found:
when others have a worksheet or other seatwork, during free-reading
time, etc. Schedule time one day a week or some time each day.

(6) You may want to have a student use a tape recorder without 
you being present. Have Kim tell how he does some aspect of 
mathematics, why he attacks a problem as he does, why he likes or
dislikes mathematics. A group of students might discuss various ways
of solving a problem. You can play the tape back later and analyze
student thinking more carefully and from a different perspective than
you can if you're involved in the interview.

Researchers have often used interviews to assess the extent to 
which students understand a procedure or can apply a process. Thus,
Lankford (1974) had seventh graders add, subtract, multiply, and
divide with whole numbers and fractions. His compilation of students'
responses can be of aid to you as a teacher, for the myriad errors
that students make can help you plan instruction to avoid them or
clarify meanings that are essential. Reys et al. (1982) used
interviews to determine how good estimators work. Not only did this 
lead to the development of materials to teach estimation, but it also 
provides clues for you about how estimation skills are used. 

3. Inventories and checklists

An inventory is a check of what the student knows about a
specific topic or about the total program. It's probably especially
useful at the beginning of the year. In oral form, primary-level
teachers find it an indispensable alternative to a written test. At
upper levels, it may be written and administered just as any other
test is. The inventory frequently is used to survey the previous
year's work or the status of students (both individuals and class) as
they begin work in your classroom. Such a test is an aid in assessing
the readiness of students for more advanced work, as well as a
diagnostic aid. List the items and skills you want to inventory.
Decide how you will inventory each: what directions will you give the
students, what tasks and materials will you use, or what test items
will you need.

A checklist is a type of inventory: a list of kinds of behavior
to look for, such as evidence of interest in mathematics,
applying mathematics, working with others, using a range of materials,
etc. Rating scales are like checklists but provide for a degree of
appraisal:

turns in assignments: never - occasionally - always
counts on fingers: frequently - sometimes - never

4. Attitude scales 

Everyone believes that the affective component of learning is
important: if students are interested in and enjoy mathematics,
they'll learn it better. Attitudes involve both cognitive and
non-cognitive aspects, an intellectual appreciation and emotional
reactions. Thus, attitudes toward mathematics involve many facets,
ranging from awareness of the structural beauty of mathematics and
of the important roles of mathematics, to feelings about the
difficulty and challenge of learning mathematics, to interest in
particular types of mathematics or particular methods of being taught
mathematics.

Students' attitudes toward mathematics are assessed in several
ways. One primary way is through observation: by observing
expression, comments, and behaviors as a student reacts in a
mathematical situation, you can infer how he or she feels about
mathematics. You can note how often Jennifer chooses a mathematical
activity when she has an option, how readily she attempts to apply
mathematical ideas to real-life situations, how enthusiastically she
reacts in a mathematics lesson. A checklist can be used as a
systematic approach to recording observations.

At times, you can ask the student to comment directly on his or
her attitudes. You can have Kai write an essay on a question such as,
"Do you generally like or dislike mathematics? Why or why not?" Or
she can be asked to complete sentences such as "I like mathematics
because ______." You may ask her to rank in order of preference the
subjects which she is studying; from this the level of her preference
for mathematics can be inferred by noting where she places it in
relation to other subject areas.

Perhaps the most widely used measure of attitudes is the attitude 
scale. Half a dozen scales have been extensively used; on many of 
them, items such as the ones on the following scale appear. 

Attitudes Toward Mathematics

(Scale Form B)

Marilyn N. Suydam and Cecil R. Trueblood
The Pennsylvania State University

This is to find out how you feel about mathematics. You are to read
each statement carefully and decide how you feel about it. Then
indicate your feeling on the answer sheet by marking:

A - if you strongly agree
B - if you agree
C - if your feeling is neutral
D - if you disagree
E - if you strongly disagree

1. Mathematics often makes me feel angry.

2. I usually feel happy when doing mathematics problems.

3. I think my mind works well when doing mathematics problems.

4. When I can't figure out a problem, I feel as though I am lost in
a mass of words and numbers and can't find my way out.

5. I avoid mathematics because I am not very good with numbers.

6. Mathematics is an interesting subject.

7. My mind goes blank and I am unable to think clearly when working
mathematics problems.

8. I feel sure of myself when doing mathematics.

9. Sometimes I feel like running away from my mathematics problems.

10. When I hear the word mathematics, I have a feeling of dislike.

11. I am afraid of mathematics.

12. Mathematics is fun.

13. I like anything with numbers in it.

14. Mathematics problems often scare me.

15. I usually feel calm when doing mathematics problems.

16. I feel good toward mathematics.

17. Mathematics tests always seem difficult.

18. I think about mathematics problems outside of class and like to
work them out.

19. Trying to work mathematics problems makes me nervous.

20. I have always liked mathematics.

21. I would rather do anything else than do mathematics.

22. Mathematics is easy for me.

23. I dread mathematics.

24. I feel especially capable when doing mathematics problems.

25. Mathematics class makes me look for ways of using mathematics
to solve problems.

26. Time drags in a mathematics lesson.

This scale attempts to ascertain less directly and therefore, it
is hoped, with greater reliability or credibility, how strongly the
student likes or dislikes mathematics. The major advantage of a scale
such as this one is that it is designed to be used in a relatively
short amount of time: five to ten minutes. Its shortcoming is that
it does not provide information across the range of factors that
comprise attitudes toward mathematics; a multidimensional scale is
needed for that. One of the most widely used scales of the
multidimensional type is the one developed by Fennema
and Sherman (1976). It assesses such facets as attitude toward
success in mathematics; stereotyping mathematics as a male domain; the
perceived attitudes of mother, father, and teacher toward one as a
learner of mathematics; effectance motivation in mathematics;
confidence in learning mathematics; and the usefulness of mathematics.
The 26-item scale above has been used at all grade levels from
kindergarten up, while the multidimensional scale is more appropriate
for use with students who are in middle schools, secondary schools, or
college.
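
The text does not spell out how responses to the 26-item scale are
turned into a score. The sketch below shows one common way of scoring
an agree-disagree scale, under two assumptions that do not come from
the source: responses A to E are worth 5 to 1 points on favorably
worded items and 1 to 5 points on unfavorably worded items, and the
split of items into the two groups is only a reading of the item
wording above. The names used (attitude_score, NEGATIVE_ITEMS) are
hypothetical.

```python
# Assumed scoring, not taken from the scale's authors:
# favorable items score A=5 ... E=1; unfavorable items are reversed.
POINTS = {"A": 5, "B": 4, "C": 3, "D": 2, "E": 1}

# Items judged unfavorable from their wording (an assumption).
NEGATIVE_ITEMS = {1, 4, 5, 7, 9, 10, 11, 14, 17, 19, 21, 23, 26}


def attitude_score(responses):
    """Total score for one student.

    responses: dict mapping item number (1-26) to a letter A-E.
    """
    total = 0
    for item, letter in responses.items():
        points = POINTS[letter]
        if item in NEGATIVE_ITEMS:
            points = 6 - points  # reverse-score unfavorable items
        total += points
    return total


# A student who marks "neutral" on every item lands at the midpoint.
neutral = {item: "C" for item in range(1, 27)}
print(attitude_score(neutral))  # 78
```

With this assumed scoring, totals run from 26 to 130, and a higher
total suggests a more favorable attitude. Reverse-scoring matters:
without it, a student who strongly agrees with every statement would
look far more positive than the mixed wording of the items warrants.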

5. Criterion-referenced tests

Paper-and-pencil instruments can help as you evaluate the
individual student in terms of his or her own progress: what has Bob
learned that he didn't know before you taught that unit on fractions
or binomials? You compare the performance of a student with his or
her previous performance. You design a test to ascertain whether or
not each student has learned what you have taught. You set a level
that says, "If a student gets this percentage of the items correct,
adequate mastery of the topic can be assumed." You can also ascertain
how well your class has mastered a particular topic, so the test
parallels the work in class. Such tests are criterion-referenced
tests or mastery tests. Besides telling you how well a topic was
learned, they also indicate the points at which you need to provide
reteaching.
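
The mastery decision described above can be sketched in a few lines.
This is only an illustration: the 80 percent mastery level, the 50
percent reteaching cutoff, and the function names are all assumptions
chosen for the example, not values recommended by the source.

```python
MASTERY_LEVEL = 0.80  # assumed mastery level; set your own


def mastery_report(results):
    """results: dict mapping student -> list of booleans (True = correct).

    Returns student -> (fraction correct, reached mastery level?).
    """
    report = {}
    for student, items in results.items():
        fraction = sum(items) / len(items)
        report[student] = (fraction, fraction >= MASTERY_LEVEL)
    return report


def items_to_reteach(results, cutoff=0.50):
    """Item numbers (starting at 1) that fewer than `cutoff` of the
    class answered correctly."""
    by_item = zip(*results.values())  # regroup answers item by item
    return [number for number, answers in enumerate(by_item, start=1)
            if sum(answers) / len(answers) < cutoff]


results = {
    "Bob": [True, True, True, True, False],
    "Ann": [True, False, True, True, False],
    "Lee": [True, True, False, True, False],
}
print(mastery_report(results)["Bob"])  # (0.8, True)
print(items_to_reteach(results))       # [5]
```

The first function answers the question about each student; the second
looks across the class for the points where reteaching may be needed.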

6. Norm-referenced tests

Paper-and-pencil instruments can also provide you with
information on the status of the student in relation to other students
in the class. A student is compared with others, with his or her
achievement evaluated relative to the achievement of the class or a
group of classes. The test may also be designed in terms of
ascertaining whether students have been learning what you think they
should be learning from your teaching. But instead of setting a
mastery level, a scale is used: you expect a few students to do very
well, a few to do poorly, but most to attain an "average" level.
These tests are based on the content you have taught, as are
criterion-referenced tests, but they are norm-referenced measures
because the performance of the student is compared to that of the
class.

7. Standardized tests

Another form of norm-referenced test is used in almost every
classroom at least once a year: the commercially published
standardized test. While a few standardized tests are
criterion-referenced, most are norm-referenced. Standardizing a test
refers to developing prescribed, uniform requirements for
administration and scoring and to the statistical analysis after the
test is given to a large, specified group of students, resulting in
the development of norms. These are expectancy levels: the scores
that students in the norming population attained. With the use of
norms based on what students in many classrooms have scored, you have
a measure of how well your students are learning when compared with
many others.

Standardized tests are not a substitute for teacher-made tests, 
but a complement. More careful preparation and research are provided 
in developing a standardized test than is ordinarily possible for any 
individual teacher to provide when developing his or her own classroom 
tests. The content has been determined on the basis of common 
elements of widely used courses of study and textbooks. But 
standardized tests assess only a portion of the content that might be
covered at a particular grade level or in a particular course. It is
imperative that care be taken to ascertain that the standardized test
adequately covers the expected outcomes of your school's mathematics
program. Producers of reputable standardized tests publish outlines
of test content to compare with your local program. Aspects that are
unique to your program will naturally not be included, so you'll have
to make provision for assessing them.

Some guidelines have been suggested for selecting a standardized
test:

(1) Formulate clearly the purposes that will be achieved by use
of the test: precisely what kinds of information are the tests
expected to supply? What outcomes are to be measured? What use is to
be made of test results?

(2) What tests are available that will meet your needs? Lists of
tests are available and should be consulted (Mitchell, 1985; NCTM,

(3) Obtain copies of those tests which, from their descriptions,
appear to meet your purposes. Most test publishers will furnish
sample test materials.

(4) Examine the tests and the test manuals for appropriateness 
for your particular needs, reliability, ease of administration and 
scoring, kinds of normative data provided, and evidences of careful 
development. Norms should have been established in schools similar to 
yours. There should be at least several thousand students in the norm 
group if the norms are to be accepted with confidence. The norms
should be stated in a convenient form, such as percentiles (which
indicate the percentage of students whose performance is found to be 
below any score) or grade norms (which show how well the average 
student in a specified grade has performed). The manual should 
include explicit directions for administration and suggestions for 
interpreting and using the results. Make sure that the time 
requirements are reasonable in terms of your school. 
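
The definition of a percentile given in guideline (4) can be made
concrete with a short calculation. The sketch below is illustrative
only: the norm-group scores are invented, the group is far smaller
than the several thousand students recommended above, and published
tests may use a slightly different formula (for example, counting half
of the tied scores). The function name is hypothetical.

```python
from bisect import bisect_left


def percentile_rank(score, norm_scores):
    """Percentage of norm-group scores that fall below `score`."""
    ordered = sorted(norm_scores)
    below = bisect_left(ordered, score)  # count of scores strictly below
    return 100 * below / len(ordered)


# Invented raw scores for a tiny norm group of ten students.
norm_group = [12, 15, 15, 18, 20, 22, 25, 27, 30, 33]
print(percentile_rank(22, norm_group))  # 50.0
```

Here a raw score of 22 is higher than five of the ten norm-group
scores, so it falls at the 50th percentile of this invented group.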

It seems safe to state that no students can avoid standardized 
tests as they progress through school. Therefore, it is wise to teach 
students how to take such tests: just reading the standardized test 
directions as they begin the first test is not enough. Develop tests 
that use the same types of items that will be met on standardized 
tests. This is particularly necessary for young children; many 
rarely see a multiple-choice item, for instance, until it is met on a 
standardized test. 

8. Diagnostic tests

Some standardized tests are planned specifically to be 
diagnostic. They usually cover a limited scope in much greater detail 
than a test of general achievement. They are arranged to give scores 
on the separate parts. 
[...] provides you with little guidance of how to
improve your instruction; knowing that the student attained an
[...]. But knowing that the answer to an example was 16 remainder 3
tells you that the student is not understanding the placement of the
answer in the quotient, that perhaps she needs help with place value,
that perhaps she does not understand the algorithm. It provides you
with some information to follow up on.

In developing a diagnostic test, select the examples with care:
[...] examples which readily allow errors of the types you
predict. Have students show all of their work [...]
multiple-choice or other types of items when you [...]

III. Developing tests 

[...] classroom [...] more than assess student
learning [...] development of all types of instruments. Then some
[...] to consider in developing various types of items.

A. Planning the test 

A well-planned test must be designed to accomplish the purpose it
[...]

1. List the objectives to be assessed by the test.

[...] What have you taught? What mathematical content and
objectives [...] important for the students to have learned? Test
objectives should correspond to instructional objectives.

[...] objectives suggest the type of evaluation procedure and
test item [...] paper-and-pencil [...]. Some objectives are best
measured by non-paper-and-pencil procedures.

[...] will vary in scope and number depending on the
type of test. For a mastery test, it may be that each objective
toward which you taught will be assessed by several questions. For
an achievement test at the end of a longer period of time, you must be
more selective in choosing only the major critical points, those which
are important in the hierarchy or as a "base" for future learning.

2. Decide on the types of items to be constructed.

The type of item depends on the nature of the objective to be
measured. Once you have determined that an objective can be measured
adequately by a paper-and-pencil item, you need to decide what type of
item to use. Some mathematical objectives are measured well by
short-answer or completion items, or by multiple-choice items; a few
objectives are best measured by true-false or matching items. Such
objective-type items (so-called because they can be scored
objectively, with independent scorers obtaining the same results)
measure knowledge and comprehension levels efficiently. A relatively
large field of content can be sampled, for objective-type items can be
answered quickly and one test can contain many questions. This broad
coverage helps provide a reliable instrument. For higher-level
outcomes, consider essay tests (yes, even for mathematics!).

3. Decide on the number of items to be written for each
objective.

There are no simple rules for determining the "right" number of 
items to use for measuring each objective. The content of a test 
should reflect the relative amount of emphasis each objective has 
received in the actual instruction; thus, the number of items will be 
in proportion to the amount of emphasis. The level of the items will 
be similarly related to the objectives. Take into consideration 
whether the interpretation of results will be in terms of each
separate objective or the test as a whole. And of course consider the 
amount of time available for administration of the test. 

To help ensure that the completed test will give each objective
the desired coverage, develop an outline of specifications to serve as
a guide for item construction:

content          % of emphasis     number      level
(objectives)     in instruction    of items    of items
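
Keeping the number of items in proportion to emphasis is simple
arithmetic, but the shares rarely come out as whole numbers. The
sketch below shows one way to round them so the counts still add up
to the planned test length. The objectives, percentages, and function
name are invented for illustration, and handing leftover items to the
largest fractional shares is just one reasonable rounding rule, not a
method given in the source.

```python
def allocate_items(emphasis, total_items):
    """Split `total_items` among objectives in proportion to emphasis.

    emphasis: dict mapping objective -> percent of instructional
    emphasis (the percentages are assumed to sum to 100).
    """
    exact = {obj: pct * total_items / 100 for obj, pct in emphasis.items()}
    counts = {obj: int(share) for obj, share in exact.items()}  # round down
    leftover = total_items - sum(counts.values())
    # Give the remaining items to the largest fractional shares.
    by_fraction = sorted(exact, key=lambda obj: exact[obj] - counts[obj],
                         reverse=True)
    for obj in by_fraction[:leftover]:
        counts[obj] += 1
    return counts


# Invented unit on fractions, planned as a 25-item test.
emphasis = {"concepts": 40, "addition": 35, "word problems": 25}
print(allocate_items(emphasis, 25))
# {'concepts': 10, 'addition': 9, 'word problems': 6}
```

The exact shares here are 10, 8.75, and 6.25 items; rounding down
leaves one item over, which goes to the objective with the largest
fractional share. You may still want to adjust the counts by hand,
for instance when results will be interpreted objective by objective.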
Tests should measure an adequate sample of the learning [...]
content included in the instruction. [...] outcomes.

B. Writing the test items: some general suggestions

[...] is to ascertain whether a student has
attained an objective or not. There should be nothing about the
structure or presentation of an item that leads those who know the
correct answer to get the item wrong or those who do not know the
answer to get the item right.

1. [...] technique that is most effective for the
specific objective.

2. [...] simple statements. Use language that students
understand. Choose concise vocabulary and sentence
construction that is appropriate to the level of your students.
Break a complex sentence into two or more separate sentences.

3. Design each item so that it provides evidence that an objective
has been achieved. Avoid testing for unimportant details,
unrelated bits of information, or irrelevant material.

4. Check items against the table of specifications to make sure
that you have the desired emphasis on various content objectives
at various levels of difficulty.

5. Work with another teacher or group of teachers in reviewing each
[...] unclear wording.

7. [...] You may want to write more items than you will need
[...]. Many teachers write down items each day for possible
[...] test, to help ensure that important points will
not be omitted.

8. Have each student work from a separate copy of the test, rather
than from a test written on the chalkboard.

9. Number items consecutively from the first item on the test
to the last.

10. [...] on the bottom of one page and the
rest on the top of the next page.
11. If the form of a test or a group of items is unfamiliar, use
sample items to help clarify the directions. Spend some time
teaching students how to take a test.

12. Precede each group of items with a simple, clear statement 
telling how and where the students are to indicate their 
answers. 

13. When you want students to show their work, provide adequate 
space near each item. "Boxing in" this space helps you to 
locate it quickly.

14. Begin a test with easy items. Placing difficult items at the 
beginning of a test is likely to discourage average and below- 
average achievers. You can then arrange items so that the test 
gets increasingly more difficult, or you can mix easy and 
difficult items.

15. Many times you'll need to have more than one type of item on a 
test (short-answer, multiple-choice, etc.). Place all items of 
one kind together. Always have more than one or two items of a 
particular type (except possibly of the essay type). 

16. Avoid a regular sequence in the pattern of responses: students
who spot it are likely to answer correctly without considering
the content of the item at all.

17. Eliminate irrelevant clues and unnecessary or non-functional 
clues, but provide a reasonable basis for responding. 

18. Make directions to the student clear, concise, and complete. 
Instructions should be so clear that students know what they 
are expected to do, although they may be unable to do it. 

19. Prepare a key containing all the answers that are to be given 
credit. Make it so that it can be placed beside the answer 
spaces used by the students.

20. After the test, go over questions with your students: they can 
point out ambiguities and other errors, helping you to improve 
items for future use.

21. Analyze student responses to each item, for diagnostic use. 
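
Suggestion 21 can be carried out by hand with a tally sheet; for a
larger class, a short script does the same counting. The sketch below
assumes short-answer responses recorded per student and an answer
key; the data and the function name are invented for illustration.

```python
from collections import Counter


def item_analysis(answers, key):
    """Tally results item by item.

    answers: dict mapping student -> list of responses, in item order.
    key: list of correct responses, in the same order.
    """
    summary = []
    for index, correct in enumerate(key):
        given = [responses[index] for responses in answers.values()]
        wrong = Counter(r for r in given if r != correct)
        summary.append({
            "item": index + 1,
            "fraction_correct": given.count(correct) / len(given),
            "common_errors": wrong.most_common(2),  # (response, count)
        })
    return summary


key = [56, 103]
answers = {"Kim": [56, 13], "Kai": [54, 13], "Bob": [56, 103]}
for row in item_analysis(answers, key):
    print(row["item"], row["common_errors"])
# 1 [(54, 1)]
# 2 [(13, 2)]
```

An incorrect response that several students share, such as the 13
given twice for the second item here, is worth following up, perhaps
with an interview, since it may point to a common misconception.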

C. Short-answer questions or completion items

The short-answer item employs a question, an incomplete
statement, or a computational example to elicit from the student
appropriate words, symbols, or numbers. It is generally limited to
questions that call for facts: who, what, when, where, how many.
Many classroom mathematics tests are solely of this type; it is
frequently used to measure the ability to compute. You can present a
number of computational exercises, or you can focus the student's
attention on particular aspects of a computation.

In the completion item, certain important words or phrases are 
replaced by blanks to be filled in by the student. It must be very 
carefully prepared, or it is likely to measure only rote memory, or 
intelligence rather than achievement. 

1. State the item so that only a single brief answer is required and
possible.

2. Use a direct question when possible; switch to an incomplete 
statement only when greater conciseness is possible. 

3. Words to be supplied should relate to the main point of the 
statement.

4. Blanks should be placed at the end of the completion statement. 

5. Avoid giving extraneous clues to the answer. 

6. If the answer can appear in more than one form, give specific
directions about which form to use. Indicate such things as 
the degree of precision for numerical answers and whether labels 
must be used. 

7. Avoid the use of sentences taken directly from the textbook. 
They are frequently ambiguous out of context, and encourage 
rote memorization. 

8. Do not give clues to the answer by varying the number or length 
of the blanks.

D. Multiple-choice items 

The multiple-choice item consists of a stem which is a question 
or an incomplete sentence presenting a problem situation, followed by 
several alternatives, which are possible solutions to the problem. 
One of the alternatives is the correct answer; the others are 
plausible answers, called distracters because their function is to 
distract students who are uncertain of the correct answer. The stem 
may also be a problem, graph, or diagram followed by the alternatives 
relating to it. 

The ease of scoring undoubtedly plays a big part in the
popularity of multiple-choice items. Student answers are easy to read
and unambiguous. The use of computer-scoring has made the
multiple-choice item virtually the only type used when a computer is
available or for standardized tests. In general, scores on
multiple-choice tests are comparable to those that would be obtained
from free-response tests, for the same level of content.

But there are other reasons for deciding to use multiple-choice 
items: they tend to provide a more adequate measure of many
objectives than do other objective-type items. Multiple-choice tests 
have high reliability compared with other types of tests. And with 
careful analysis and development, the multiple-choice item can be 
adapted to most types of content and to most levels of objectives. It 
can assess the student's ability to recognize facts or relationships, 
to discriminate, to interpret, to analyze, to make inferences, to 
solve problems. Its biggest weakness is that it allows the student to 
guess, but this affects scores less than on other types of items. 
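
A short calculation shows why guessing matters less as the number of
alternatives grows. The sketch assumes blind guessing, with every
alternative equally likely to be chosen; real students usually can
rule out some alternatives, so this is only a baseline. The function
name is hypothetical.

```python
from fractions import Fraction


def expected_guessing_score(num_items, num_alternatives):
    """Expected number correct when every item is a blind guess."""
    return num_items * Fraction(1, num_alternatives)


# A 20-item test answered entirely by guessing:
print(expected_guessing_score(20, 2))  # 10  (true-false)
print(expected_guessing_score(20, 4))  # 5   (four alternatives)
print(expected_guessing_score(20, 5))  # 4   (five alternatives)
```

On a 20-item test, blind guessing would be expected to yield about
half the items on a true-false test but only a fifth or a quarter on
a multiple-choice test with five or four alternatives.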

Multiple-choice items should not be used when a simple question 
is adequate, that is, where there is clearly only one correct answer 
and no plausible distracters. They should not be used when there are
only two plausible responses; a true-false item is usually effective 
in that instance. 

1. Make directions explicit, so that students know exactly what type 
of response is required. Is more than one answer possible? Are 
they to select "the correct answer" or "the best answer"? How 
should they record answers? Should they guess if they aren't sure
of the correct answer? 

2. The stem should present a single worthwhile problem to be solved, 
expressed clearly and without ambiguity. State the question so 
there can be only one interpretation. Check on the clarity of 
the stem by covering the alternatives and determining whether the 
question could be answered without the choices. 

3. Make each question independent of other questions. Students are 
often able to select the correct answer to one item because of 
information gleaned from another item. Where an answer to one 
item is used in succeeding items, students who miss that item 
will miss the succeeding items. 

4. Make alternative choices as brief as possible. Instead of 
repeating words in each alternative, include them in the stem.

5. State the stem in positive form whenever possible. When negative
wording is used, emphasize it by underlining or by capitalizing. 

6. The best alternative choices to the correct answer are those 
using commonly mistaken ideas or common misconceptions or errors 
commonly made by the students. Excellent distracters can be
obtained from incorrect responses on short-answer, completion, or
essay tests.
7. In general, use the same number of alternatives for each item
on a test. But remember that an item is not improved by
adding an obviously wrong answer merely to obtain another
alternative. Generally four or five alternatives are used,
to reduce the chance of guessing the correct answer. It is
better to have only four alternatives when five plausible
choices are not available.

8. Make all incorrect responses equally plausible or "attractive"
to the student who does not know the correct answer. If
plausible distracters are difficult to find, use another type
of item rather than ineffective alternatives. The more
homogeneous the alternatives, the more difficult the item will
be. The correct answer is one which cannot be refuted.

9. Make all alternatives grammatically consistent with the stem,
and parallel in form. Avoid verbal clues which might enable
students to select the correct answer or to eliminate an
incorrect alternative: similarity of wording in the stem and
the correct answer, for instance, or including two responses
that are all-inclusive or two that have the same meaning.
Check the structure by reading each alternative with the stem.

10. Do not consistently make the correct response longer or shorter
than the distracters. There is a tendency to include the
greatest amount of detail in the correct answer.

11. Avoid the use of qualifying words such as "always", "never", or
"all" as much as possible; they are clues to a test-wise student
that an alternative probably is not true.

12. Avoid use of the alternative "all of the above" and use "none
of the above" with care. The inclusion of "all of the above"
makes it possible to answer the item on the basis of partial
information; the student can realize that it is the correct
choice by noting that two of the alternatives are correct, or
that it is not the correct choice by noting that at least one
of the alternatives is incorrect. The chance of guessing the
correct answer is thus increased. The use of "none of the above"
may be measuring only the ability to detect incorrect answers;
a student may do this and still not know the correct answer.
If you want to reduce the chances of students estimating the
answer without doing an entire computation (when that is the
objective), use a completion-type item.

13. Avoid using a pattern for the position of the correct response.
Students are quick to perceive patterns or apparent patterns
and select their answers accordingly. Use some system of random
order for the positions of the correct answers on each
multiple-choice test, and check to make sure that patterns did not
inadvertently occur. Many teachers fail to place the correct answer
at a, d, and e as often as at b and c. Students learn that
their chances of guessing the correct answer are better if they
guess b or c. Be sure the correct response is placed in all
positions approximately the same number (but not exactly the same
number) of times.

14. Control the difficulty of the item either by varying the problem
in the stem or by changing the alternatives.

15. Use an efficient item format.

a. List alternatives on separate lines, one under the
other, making them easy to read and compare.

b. Use letters in front of alternatives, to avoid
confusion with numerical answers. For algebra tests,
you might use numerals in parentheses.
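
One way to follow the advice about random positions for the correct
answer is to let a random number generator order the alternatives and
then tally where the correct answers landed. The sketch below is only
an illustration: the items, the function names, and the seed are
invented, and it assumes no distracter is identical to the correct
answer.

```python
import random
from collections import Counter

LETTERS = "abcde"


def shuffle_alternatives(correct, distracters, rng):
    """Return (alternatives in random order, letter of the correct one)."""
    alternatives = [correct] + list(distracters)
    rng.shuffle(alternatives)
    return alternatives, LETTERS[alternatives.index(correct)]


def key_positions(keys):
    """Count how often each letter holds the correct answer."""
    return Counter(keys)


rng = random.Random(1986)  # fixed seed so the same layout can be rebuilt
items = [
    ("12", ["10", "14", "21"]),
    ("3/4", ["1/4", "4/3", "7"]),
    ("0.5", ["0.05", "5", "50"]),
]
keys = []
for correct, distracters in items:
    alternatives, letter = shuffle_alternatives(correct, distracters, rng)
    keys.append(letter)
    print(list(zip(LETTERS, alternatives)), "key:", letter)
print(key_positions(keys))
```

The final tally is the check for inadvertent patterns: if one letter
holds the correct answer far more often than the others, reshuffle or
adjust a few items by hand.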

E. True-false items

The true-false item can be difficult to construct, for statements
must be unquestionably true or false. To construct such items to
measure important outcomes is difficult; they adapt best to the
measurement of specific facts, understanding of principles or
generalizations, and common misconceptions. They can be used only
when there are only two possible alternatives. Because they are
highly subject to guessing, true-false items have little value as
diagnostic tools.

"Alternative-response items" are variations in which the student
must respond "agree" or "disagree"; "right", "partly right", or
"wrong"; or with similar words. Other variations include items in
which attention is directed to an underlined word or phrase; after
deciding that any statement is false, the true words are to be
inserted in place of the underlined words. Students can also be asked
to state why the statement is true or false. Cluster true-false items
deal with a single idea; such mathematical content as graphing can be
tested with such an item, where students are asked to look at a graph
and then respond to a series of true-false items about the data
portrayed.

1. Have students circle T or F, or write T and F or + and 0 (rather
than t and f or + and -, which cannot be distinguished as
readily).

2. State the item clearly and specifically so that it is
unequivocally true or false. Avoid the use, however, of specific
qualifiers such as "always" or "never", or use them in both
true and false statements. Check for ambiguities.

3. The item should deal with a single definite idea. The use of
several ideas in each statement tends to be confusing and the
item is more likely to measure reading ability than achievement.
There should be no more than one problem-setting clause.

4. Avoid making true statements longer than false statements.

5. Make the crucial element readily apparent to the student. It
is better to have the crucial element come at the end rather
than in the early part of a two-part statement.

6. Have an approximately equal (but not exactly equal) number of
true and false statements (vary the proportions from test to
test).

7. Randomly arrange true and false items; check to be sure there
is no inadvertent pattern.

8. Avoid trick statements which appear to be true but are really
false because of some inconspicuous or trivial word or phrase.

9. Avoid statements that are partly true and partly false.

10. Avoid the use of statements extracted from textbooks. Out of
context, such statements are often unclear or ambiguous.

F. Matching items 

The matching item measures ability to discriminate between 
several items of similar material as they are related in a given way 
with items of another set. The matching exercise is essentially a 
modification of the multiple-choice form. When all of the responses 
in a series of multiple-choice items are the same, the matching format 
is more appropriate. Said another way, unless all of the responses in 
a matching item serve as plausible alternatives for each premise, the 
matching format is inappropriate.

Matching items can be used for such content as definitions and 
words defined, measurement and formulas, or geometric shapes and 
names. They are most appropriate for testing at the knowledge level; 
it is difficult to adapt them to testing for comprehension and 
higher-level goals. 

1. Place the premise column on the left, the briefer responses on
the right. Each of the items in the left column should have a
test item number; the responses should be preceded by letters.
Have students place answers to each item in a space to the left
of the item number.

2. The items in the two columns must be homogeneous (that is, no
responses should be logically excludable as answers by a student
who is uninformed). If they are not homogeneous, students may
be provided with clues which will help them to match the terms,
resulting in easier test items. Selection of the correct match
should be dependent on knowledge of the correct answer, not on
ability to eliminate incorrect answers on the basis of extraneous
information.

3. To reduce the effect of guessing, one column should contain more
terms than the other. Directions should clearly indicate whether
responses may be used once, more than once, or not at all.

4. Do not include too many items in either column: a maximum of
twelve items in the premise column should be considered. Longer
lists require too much searching time.

5. Place the items in the response column in some logical order, to
enable students to scan the list quickly to find the term they
had in mind. Jumbling the terms merely increases searching time,
without increasing the probability of correct answers being
located.

6. Be sure that there is only one response which is the correct 
match for each premise when responses are to be used only once. 
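
Several of the guidelines above are mechanical enough to check automatically. The following is only a minimal sketch, assuming a matching item is stored as two lists and an answer key; the names `MatchingItem` and `check_matching_item` are hypothetical, and alphabetical order stands in for "some logical order."

```python
from dataclasses import dataclass


@dataclass
class MatchingItem:
    premises: list[str]          # left column, numbered on the test
    responses: list[str]         # right column, lettered on the test
    key: dict[str, str]          # premise -> correct response
    reuse_allowed: bool = False  # do the directions allow reusing a response?


def check_matching_item(item: MatchingItem, max_premises: int = 12) -> list[str]:
    """Return warnings for guidelines 3 to 6 that the item appears to break."""
    warnings = []
    if len(item.premises) > max_premises:
        warnings.append("too many premises")
    if len(item.premises) == len(item.responses):
        warnings.append("columns are the same length")
    # Assumption: alphabetical order is the "logical order" being used.
    if item.responses != sorted(item.responses):
        warnings.append("responses are not in alphabetical order")
    if not item.reuse_allowed:
        used = list(item.key.values())
        if len(used) != len(set(used)):
            warnings.append("a response is keyed to more than one premise")
    if any(item.key.get(p) not in item.responses for p in item.premises):
        warnings.append("a premise has no keyed response in the list")
    return warnings


shapes = MatchingItem(
    premises=["three sides", "four equal sides and four right angles", "five sides"],
    responses=["hexagon", "pentagon", "square", "triangle"],
    key={
        "three sides": "triangle",
        "four equal sides and four right angles": "square",
        "five sides": "pentagon",
    },
)
print(check_matching_item(shapes))  # []
```

Homogeneity (guideline 2) is a judgment about content and is not something a check like this can decide.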

G. Essay items 

Essay items are not often used on mathematics tests, but they can 
and should be. Such items require students to do more than compute a 
solution or recall specific facts. They must think about mathematics 
and meaning. They must organize their own ideas and express 
themselves effectively in their own words, using both knowledge and 
reasoning. Purely factual information is not assessed as efficiently 
as with objective-type items, but higher levels of reasoning can be 
tapped. Essay questions can be used to assess comprehension, 
application, and analysis outcomes; they provide a means of assessing 
a student's ability to synthesize or to evaluate mathematical ideas 
which is rarely provided by objective items. Essay questions that 
assess complex achievement are apt to include such key words as why, 
explain, compare, relate, interpret, criticize, develop, derive, 
classify, illustrate, and apply. Clearly, they assess higher-order 
thinking skills. 

There are difficulties in using essay items, as you're aware. An 
essay test covers a limited field; the questions take so long to 
answer that relatively few can be answered in a given period of time. 
A representative sampling of content is not feasible. Essay items are 
subjective, more difficult to score, and less reliable than 
objective-type items. Scores are apt to be distorted by writing 
ability and by bluffing. Students who are fluent can often avoid 
discussing points of which they are unsure. But there are things you 
can do to minimize these problems, beginning with the writing of 
clearly defined items: general enough to offer some leeway, but 
specific enough to set limits. 

1. Use essay questions to evaluate achievement on those objectives 
not readily tested by other types of items. 

2. Phrase the questions as precisely as possible and be specific 
in wording, so the objective of the item is clear and students 
are made aware of the specific scope or limits to be included 
in the answer. 

3. Make clear to students the basis on which the answer will be 
judged, such as content, organization, comprehensiveness, 
relevance, appropriateness, etc. 

4. Require all students to answer all questions, so they are all 
taking the same test. One way of doing this is by setting time 
limits for each item. Be sure that students have time to write 
adequate answers; time must be allowed for thinking as well as 
for writing. Provide adequate space for answers (or have 
students write on separate paper). 

5. Discuss ways of answering essay questions with the students. 

Since scoring essay items can be difficult, here are some 
suggestions which will increase objectivity: 

1. List specific objectives for each essay question as you write it. 
Evaluate in terms of the objectives. Separate scores may be 
given for style of writing or spelling, but should not 
contaminate the evaluation of the mathematical objective being 
assessed. 

2. Write out the essentials of a complete answer to each question or 
prepare a model answer ahead of time. Use it in the same way in 
scoring each paper. This does not preclude adding other 
acceptable points made by students. Determine the number of 
points to be assigned to each part of the model answer, or 
determine criteria for levels of expected quality. 

3. Keep the identity of students unknown. Have students use a 
coded numeral on the papers or have them write their names on 
the back or at the end of the test. 

4. Read one question through the entire set of papers, scoring 
each item for all papers before going on to the next item. 

5. More uniform standards can be applied by reading the answers 
twice. At the first reading, sort the papers into several piles. 
Then reread to check on the uniformity of answers in each pile 
and make any necessary changes in rating. Assign the same item 
score to all papers in a pile. 

6. Reshuffle the papers after scoring each item, so that a paper 
may not be scored unduly high or low because of its position. 
H. Some related points 

1. Item pools 

An item pool is simply a collection of test items that you can 
put together in various combinations to form a test. Several items 
may be developed for testing each specific objective; you can select 
the one that best meets test requirements. You'll probably find that 
a card file is the easiest way of filing the items. Write each item 
on a card, noting the topic or objective in one corner. At the bottom 
or on the back, record what you've learned about the item: when it 
should be used, what percentage of students get it correct each time 
you use it, and so on. 
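
If you keep the file on a computer rather than on cards, each card might look something like the record below. This is a rough sketch under assumptions: the class name `ItemCard`, its fields, and the sample items are all invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ItemCard:
    """One card in the file: the item plus what has been learned about it."""
    text: str
    objective: str                     # the note in the corner of the card
    when_to_use: str = ""              # e.g. "pretest" or "unit review"
    percent_correct: list[float] = field(default_factory=list)  # one entry per use

    def record_use(self, number_correct: int, number_tested: int) -> None:
        """Add the percentage of students who got the item correct this time."""
        self.percent_correct.append(round(100 * number_correct / number_tested, 1))


def items_for(pool: list[ItemCard], objective: str) -> list[ItemCard]:
    """Return every card filed under the given objective."""
    return [card for card in pool if card.objective == objective]


pool = [
    ItemCard("Find the area of a 3 cm by 5 cm rectangle.", "area of rectangles"),
    ItemCard("A rectangle has area 24 and width 4. Find its length.", "area of rectangles"),
    ItemCard("Name a polygon with five sides.", "polygon names"),
]
pool[0].record_use(number_correct=18, number_tested=24)
print(len(items_for(pool, "area of rectangles")), pool[0].percent_correct)  # 2 [75.0]
```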

Other sources of models for items include commercial tests, 
textbooks for students or teachers, collections of items or item 
banks, and the tests which were constructed for various research 
studies. 

Item sampling is a technique for assessing the status of a group 
of students. The National Assessment of Educational Progress uses 
this procedure, in order to avoid having students take a lengthy test. 
Instead of having all students at age nine answer all items, many 
similar samples of students are selected and each answers varied 
portions of the items. Then the scores are combined to depict how 
well nine-year-olds, as a group, answered the questions. Since your 
focus is usually on how well students are achieving, rather than on 
how well content is being achieved across students, you will probably 
not use item sampling techniques. You may find the term appearing 
frequently in various articles about testing, however. 

2. Item analysis 

Item analysis is the process of studying the students' responses 
to each item. An item analysis can tell you how difficult an item is 
and how well each question discriminates between high- and low-ranking 
students. It's especially important if you are going to reuse the 
item: it can indicate whether or not an item needs to be revised. 
It's also useful even if you don't plan to use the item again, for it 
can tell you what questions are especially appropriate to test certain 
objectives. Or it can be used simply as part of your diagnostic 
procedures. 

Computer programs are used for item analysis for tests that are 
developed for research studies, for standardized tests, and for other 
tests to be used with many groups of students. Perhaps you have 
available a microcomputer program that can perform an item analysis. 
Unless you prefer to use it, however, only simple item analysis 
procedures are warranted in most classrooms. Here are several 
suggestions: 



(1) Look at the test: what items were missed by many students? 
Was an item missed because of a "fault" in the item or a "fault" 
in the instruction? What do you do next? Revise the item or 
revise the instruction. 



(2) A simple measure of difficulty is the percentage of students 
who got the item correct. This gives you an approximation of how 
difficult the item is. By recording this information for each item in 
your item pool, you can build a test which will be at an appropriate 
level of difficulty. This is especially helpful when you're developing a 
test on which you want to rank students: each item should then be of 
medium difficulty, approximately 40% to 60%. (For mastery tests, 
your standards will be different.) 

You can check the students' papers yourself to obtain the 
percentage, or you can do an item analysis by a show of hands. Call 
out the item numbers one by one and have students who have the item 
correct hold up their hands. Count and record the number of hands. 
Have students convert it to a percent, or do this yourself. 
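
The conversion from a show of hands to a percentage is simple arithmetic, and the sketch below only illustrates it. The counts, the class size of 25, and the function names are made up; the 40% to 60% band is the medium-difficulty range mentioned above.

```python
def percent_correct(hands_up: dict[int, int], class_size: int) -> dict[int, float]:
    """Convert the number of raised hands per item to a percentage."""
    return {item: round(100 * count / class_size, 1) for item, count in hands_up.items()}


def medium_difficulty(percents: dict[int, float], low: float = 40, high: float = 60) -> list[int]:
    """List the items whose percent correct falls in the medium range."""
    return [item for item, pct in percents.items() if low <= pct <= high]


# Hypothetical counts: item number -> hands raised, in a class of 25.
hands = {1: 22, 2: 13, 3: 6}
percents = percent_correct(hands, class_size=25)
print(percents)                     # {1: 88.0, 2: 52.0, 3: 24.0}
print(medium_difficulty(percents))  # [2]
```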

You can extend this activity by building a graph with the 
students, recording either the number of students who got the item 
correct or the number of incorrect responses. (For multiple-choice 
items, you can record the number selecting each alternative.) Are 
there any patterns in the graph? What items were missed most? Are 
there problem areas involving any particular objective? 



(3) To do a more sophisticated item analysis, use this procedure: 

(a) Arrange the test papers in order from highest to lowest score. 

(b) Select the highest one-third and the lowest one-third 
(approximately), setting aside the middle one-third of the 
papers. 

(c) For each item, count the number of students in the upper group 
who got it right and the number in the lower group who got it 
right. Let's say you have 10 papers in the upper one-third and 
10 in the lower one-third. For one item, here's the count for 
the correct answer: 

upper: 7 
lower: 3 

(d) Convert these numbers to percentages: 

for all students: (7 + 3)/20 = 50% 
for upper-ranking students: 7/10 = 70% 
for lower-ranking students: 3/10 = 30% 

Most items on a test used to rank students should be of medium 
difficulty, so this item appears to be at a satisfactory level of 50%. 
The harder the item, the lower the percentage of students getting it 
correct. Moreover, if the item is a good one for ranking students, 
then substantially more students in the upper group will have answered 
it correctly, as happened in this case. Items on which many more 
students in the lower group got the item correct need revision. Thus, 
if the percentages above had been reversed, with 70% of the 
lower-ranking students getting it correct while only 30% of the 
upper-ranking students got it correct, there would be something wrong 
with the item and it should be revised or discarded. 
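
Steps (a) through (d) can be followed by hand, but they are also easy to sketch as a short program. The version below is one possible reading of the procedure, not a standard routine: the function name, the way papers are represented, and the made-up class of 30 papers are all assumptions chosen so that the counts match the 7 and 3 used above.

```python
def upper_lower_analysis(papers, item):
    """papers: list of (total_score, set of item numbers answered correctly)."""
    # (a) Arrange the papers from highest to lowest score.
    ranked = sorted(papers, key=lambda paper: paper[0], reverse=True)
    # (b) Take roughly the top third and the bottom third.
    n = round(len(ranked) / 3)
    if n == 0:
        raise ValueError("need at least two papers")
    upper, lower = ranked[:n], ranked[-n:]
    # (c) Count correct answers to the item in each group.
    upper_right = sum(item in correct for _, correct in upper)
    lower_right = sum(item in correct for _, correct in lower)
    # (d) Convert the counts to percentages.
    return {
        "all": 100 * (upper_right + lower_right) / (2 * n),
        "upper": 100 * upper_right / n,
        "lower": 100 * lower_right / n,
    }


# Hypothetical class of 30 papers with scores 30 down to 1. Item 5 is
# answered correctly by seven papers in the top ten and three in the bottom ten.
papers = []
for score in range(30, 0, -1):
    in_upper_seven = score >= 24        # scores 30..24
    in_lower_three = 8 <= score <= 10   # scores 10, 9, 8
    papers.append((score, {5} if in_upper_seven or in_lower_three else set()))

print(upper_lower_analysis(papers, item=5))
# {'all': 50.0, 'upper': 70.0, 'lower': 30.0}
```

An item whose "lower" percentage exceeds its "upper" percentage is the kind the text says needs revision.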

(4) On multiple-choice tests, determine the effectiveness of 
distracters by comparing the number of students in upper and lower 
groups who selected each incorrect alternative. A good distracter 
will attract more students from the lower group than from the upper 
group. Each distracter should attract some students or it is not 
serving effectively as a distracter. (Different criteria, however, 
apply to mastery tests.) 
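
The same upper and lower groups can be used to tally distracters. The sketch below is illustrative only; the function name, the three labels it assigns, and the sample responses are assumptions, with the labels simply restating the two criteria in the paragraph above.

```python
from collections import Counter


def distracter_table(upper_choices, lower_choices, alternatives, correct):
    """Tally each incorrect alternative by group and label how it is working."""
    upper, lower = Counter(upper_choices), Counter(lower_choices)
    table = {}
    for alt in alternatives:
        if alt == correct:
            continue
        u, l = upper[alt], lower[alt]
        if u == 0 and l == 0:
            label = "unused"    # attracts no one, so it is not distracting
        elif l > u:
            label = "working"   # attracts more of the lower group
        else:
            label = "review"    # attracts the upper group as much or more
        table[alt] = (u, l, label)
    return table


# Hypothetical responses to one item whose correct answer is B,
# from 10 upper-group and 10 lower-group papers.
upper_choices = list("BBBBBBBACC")
lower_choices = list("BBBAAACCCA")
print(distracter_table(upper_choices, lower_choices, "ABCD", correct="B"))
# {'A': (1, 4, 'working'), 'C': (2, 3, 'working'), 'D': (0, 0, 'unused')}
```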

3. Two definitions 

Any test, whether constructed by an individual teacher or 
commercially published, should meet several criteria, including 
acceptable validity and reliability. Validity pertains to the 
relevance of the test. Are you collecting the right kinds of 
information? Does the test measure the skills, understanding, or 
knowledge that it was intended to test? Does it measure the 
specific behavior that it was intended to test? Does it measure 
the significant behaviors that are specified in the objectives? Are 
all items relevant to those behaviors? Is the test a balanced 
sampling of the behaviors? Reliability pertains to the consistency of 
the test. How accurate and stable is the test? Does it measure the 
same achievement consistently? The nature of mathematics helps to 
make mathematics tests quite reliable. If a test were perfectly 
reliable, the students would have the same score or be ranked in the 
same order if the test were repeated, or a parallel form of the same 
test were administered. Reliability is commonly reported by a 
coefficient of correlation between forms of the test or between two 
halves of the same test. Perfect reliability is represented by a 
coefficient of 1.00. Usually a coefficient of at least .80 is 
acceptable for an objective mathematics test; many mathematics tests have 
reliabilities of .90 and higher. Tests of computational ability are 
usually more reliable than tests of problem-solving ability. 
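
A coefficient between two halves of the same test can be sketched as follows. This is a bare illustration, assuming each student's paper is recorded as a list of 1s (correct) and 0s (incorrect) and that the halves are the odd-numbered and even-numbered items; the split, the function names, and the data are all assumptions, and the ordinary product-moment correlation is used.

```python
from math import sqrt


def correlation(x, y):
    """Product-moment correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("scores on one half do not vary")
    return covariance / (spread_x * spread_y)


def split_half_coefficient(item_scores):
    """item_scores: one list of 0/1 item scores per student."""
    odd_half = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_half = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    return correlation(odd_half, even_half)


# Hypothetical six-item test taken by five students.
scores = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 1],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
print(round(split_half_coefficient(scores), 2))
```

With so few students the coefficient means little; the point is only to show what "correlation between two halves" refers to.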

You probably have many other questions. Answers to these 
questions, whether about definitions or about testing or about other 
aspects of evaluation, may be found in one or more of the 
references included at the end of this booklet. These references are 
grouped by major theme, to aid you in locating pertinent information. 

IV. Concluding comment 

The goal of evaluation is improving instruction. Measuring or 
assessing or testing only indicates; the teacher then has to do 
something as a result of what he or she has learned. This booklet has 
not attempted to consider the most difficult task in teaching: the 
use of the knowledge and understanding gained from evaluation. 
Evaluation is only a beginning ... you must continue the process of 
teaching. 

Selected References on Evaluation 

These references, selected for their potential interest to teachers, 
are categorized under the following themes: 

1. Attitudes 
2. Competency Testing 
3. Criterion-Referenced Tests 
4. Diagnostic Testing 
5. General 
6. Interviews 
7. Item Pools 
8. Materials Evaluation 
9. Other Evaluation Techniques 
10. Problem-Solving Tests 
11. Program Evaluation 
12. Special Populations 
13. Standardized Tests 
14. Teacher-Made Tests 

Attitudes 



Fennell, Francis. Affective Assessment Strategies in a Diagnostic 
Prescriptive (Clinical) Mathematics Setting. April 1979. ERIC: 
ED 191 684. 

Fennema, E. and Sherman, J. Fennema-Sherman Mathematics Attitudes 
Scales. JSAS Catalog of Selected Documents in Psychology 6: 
31; 1976. (Ms. No. 1225) 

Hodges, Helene L. B. Learning Styles: Rx for Mathophobia. 
Arithmetic Teacher 30: 17-20; March 1983. 

Michaels, Linda A. and Forsyth, Robert A. Measuring Attitudes Toward 
Mathematics? Some Questions to Consider. Arithmetic Teacher 
26: 22-25; December 1978. 

Competency Testing 

Carpenter, Thomas; Coburn, Terrence G.; Reys, Robert E.; and Wilson, 
James W. Results from the First Mathematics Assessment of the 
National Assessment of Educational Progress. Reston, Virginia: 
National Council of Teachers of Mathematics, 1978. 

Carpenter, Thomas P.; Corbitt, Mary Kay; Kepner, Henry S., Jr.; 
Lindquist, Mary Montgomery; and Reys, Robert E. Results from the 
Second Mathematics Assessment of the National Assessment of 
Educational Progress. Reston, Virginia: National Council of 
Teachers of Mathematics, 1981. 

Carter, Betsy Y. and Leinwand, Steven J. Calculators and 
Connecticut's Eighth-Grade Mastery Test. Arithmetic Teacher 
34: 55-56; February 1987. 

Crosswhite, F. Joe; Dossey, John A.; Swafford, Jane O.; McKnight, 
Curtis C.; and Cooney, Thomas J. Second International 
Mathematics Study Summary Report for the United States. 
Champaign, Illinois: Stipes Publishing Company, 1985. 

Hagan, Ronald D. Factors Influencing Arithmetic Performance on the 
Tennessee State-Mandated Eighth Grade Basic Skills Test. School 
Science and Mathematics 82: 490-505; October 1982. 

Henderson, George L. and others. Wisconsin Mathematics Test, Grades 
7 and 8. Madison: Wisconsin State Department of Public 
Instruction, 1978. ERIC: ED 051 185, ED 151 186, ED 069 475. 

Mappus, L. Lynne and others. Mathematics: Teaching and Testing Our 
Basic Skills Objectives - Grades 1, 2, 3; Grades 4, 5, 6; 
Grades 7, 8. 1981. ERIC: ED 226 056, ED 226 057, ED 226 058. 

Matt, Warren. Implementing Mathematics Proficiency Testing. 
Mathematics Teacher 73: 19-22; January 1980. 

National Assessment of Educational Progress. The Third National 
Mathematics Assessment: Results, Trends and Issues. Denver: 

Ortiz-Franco, Luis. Patterns of Mathematics Minimum Competency Skills 
in the Elementary School. Los Alamitos, California: Southwest 
Regional Laboratory for Educational Research and Development, 
August 1979. ERIC: ED 204 114. 

Smith, William D. Minimal Competencies: A Position Paper. 
Arithmetic Teacher 26: 25-26; November 1978. 

Suydam, Marilyn N. Assessing Achievement Across the States: 
Mathematical Strengths and Weaknesses. Columbus, Ohio: ERIC 
Clearinghouse for Science, Mathematics, and Environmental 
Education, December 1984. ERIC: ED 255 356. 

Criterion-Referenced Tests 

Besel, Ronald. Using Group Performance to Interpret Individual 
Responses to Criterion-Referenced Tests. February 1973. 
ERIC: ED 076 658. 

Heines, Jesse M. An Examination of the Literature on Criterion- 
Referenced and Computer-Assisted Testing. November 1975. 
ERIC: ED 116 633. 

Knipe, Walter H. and Krahmer, Edward P. An Application of Criterion 
Referenced Testing. February 1973. ERIC: ED 074 154. 

Porter, Deborah Elena. Criterion Referenced Testing: A Bibliography. 
TM Report 53. Princeton, New Jersey: ERIC Clearinghouse on 
Tests, Measurement, and Evaluation, December 1975. ERIC: 
ED 117 195. 

Roudabush, Glenn E. and Green, Donald Ross. Some Reliability Problems 
in a Criterion-Referenced Test. February 1971. ERIC: 
ED 050 144. 

Winkles, Jim. Criterion-Referenced Testing and Core Curriculum. 
Australian Mathematics Teacher 37: 8-11; August 1981. 

Diagnostic Testing 

Algozzine, Bob and McGraw, Karen. Diagnostic Testing in Mathematics: 
An Extension of the PIAT? Minneapolis: University of Minnesota 
Institute for Research on Learning Disabilities, March 1979. 
ERIC: ED 185 749. 

Burns, Paul C. Analytical Testing and Follow-up Exercises in 
Elementary School Mathematics. School Science and Mathematics 
65: 34-38; January 1965. 

Dunlap, William P. and Brennen, Alison H. Blueprint for the Diagnosis 
of Difficulties with Cardinality. Journal of Learning 
Disabilities 14: 12-14; January 1981. 

Engelhardt, Jon M. and Wiebe, James H. Measuring Diagnostic/Remedial 
Competence in Teaching Elementary School Mathematics. 1978. 
ERIC: ED 177 018. 

Herman, Joan and Winters, Lynn. Test Design Manual: Guidelines for 
Developing Diagnostic Tests. Los Angeles: California University 
of Los Angeles, 1985. ERIC: ED 266 159. 

Liedtke, Werner. Learning Difficulties: Helping Young Children with 
Mathematics - Subtraction. Arithmetic Teacher 30: 21-23; 
December 1982. 

McAloon, Ann. Using Questions to Diagnose and Remediate. Arithmetic 
Teacher 27: 44-48; November 1979. 

General 

Australian Council for Educational Research. Mathematics Evaluation 
Procedures K-2, for Use by Teachers with Children in Years K-4. 
Hawthorn, Australia: The Council, September 1980. ERIC: 
ED 194 317. 

Bloom, Benjamin S. (Ed.) Taxonomy of Educational Objectives: The 
Classification of Educational Goals. Handbook I: Cognitive 
Domain. New York: David McKay, 1956. 

Bloom, Benjamin S.; Hastings, J. T.; and Madaus, G. F. Handbook on 
Formative and Summative Evaluation of Student Learning. New 
York: McGraw-Hill, 1971. 

Fielding, Glen D. and Shalock, Del H. Integrating Teaching and 
Testing: A Handbook for High School Teachers. Monmouth: 
Oregon State System of Higher Education, January 1985. 
ERIC: ED 257 821. 

Krathwohl, David R.; Bloom, Benjamin S.; and Masia, Bertram B. 
Taxonomy of Educational Objectives: The Classification of 
Educational Goals. Handbook II: Affective Domain. New York: 
David McKay, 1964. 

Mathematical Sciences Education Board. Information Releases. 
Washington, D.C.: The Board, National Academy of Sciences, 1986. 

National Assessment of Educational Progress. Math Objectives: 
1985-86 Assessment. Princeton, New Jersey: NAEP, 1985. 

National Council of Teachers of Mathematics. An Agenda for Action: 
Recommendations for School Mathematics of the 1980s. Reston, 
Virginia: The Council, 1980. 

Norton, Mary Ann. Teaching: Improve Your Evaluation Techniques. 
Arithmetic Teacher 30: 6-7; May 1983. 

Pikaart, Len and Travers, Kenneth J. Teaching Elementary School 
Mathematics: A Simplified Model. Arithmetic Teacher 20: 
332-342; May 1973. 

Stiggins, Richard J. and Bridgeford, Nancy J. The Ecology of 
Classroom Assessment. Journal of Educational Measurement 
22: 271-286; Winter 1985. 

Swart, William L. Evaluation of Mathematics Instruction in the 
Elementary Classroom. Arithmetic Teacher 21: 7-11; 
January 1974. 

Virginia Council of Teachers of Mathematics. Mathematics Assessment 
for the Classroom Teacher. Charlottesville, Virginia: The 
Council, 1983. 

Wirtz, Robert. The Tyranny of Tests in Elementary School Mathematics. 
Washington, D.C.: Curriculum Development Associates, April 
1979. ERIC: ED 176 950. 

Interviews 

Callahan, Leroy G. Clinical Evaluation and the Classroom Teacher. 
February 1973. ERIC: ED 076 640. 

Far West Laboratory for Educational Research and Development. 
Instruments for Individual Assessment of Achievement. Beginning 
Teacher Evaluation Study Technical Note Series. San Francisco: 
The Laboratory, September 1975. ERIC: ED 170 304. 

Ginsburg, Herbert. The Clinical Interview in Psychological Research 
on Mathematical Thinking: Aims, Rationales, Techniques. For the 
Learning of Mathematics 1: 4-11; March 1981. 

Hart, Kath. Tell Me What You Are Doing. Mathematics Teaching 99: 
32-37; June 1982. 

Lankford, Francis G., Jr. What Can a Teacher Learn About a Pupil's 
Thinking Through Oral Interviews? Arithmetic Teacher 21: 
26-32; January 1974. 

Reys, Robert E.; Rybolt, James P.; Bestgen, Barbara J.; and Wyatt, 
J. Wendell. Processes Used by Good Computational Estimators. 
Journal for Research in Mathematics Education 13: 183-201; 
May 1982. 

Schoen, Harold L. Using the Individual Interview to Assess 
Mathematics Learning. Arithmetic Teacher 27: 34-37; November 
1979. 

Weaver, J. Fred. Big Dividends from Little Interviews. Arithmetic 
Teacher 2: 40-47; April 1955. 

Williams, S. Irene and Jones, Chancey O. A Comparison of Interview 
and Normative Analysis of Mathematics Questions. Princeton, 
New Jersey: Educational Testing Service, April 1972. ERIC: 
ED 067 397. 

Item Pools 

Arter, J. and Estes, G. D. Item Banking for Local Test Development: 
Practitioners' Handbook. Portland, Oregon: Northwest Regional 
Educational Laboratory, 1985. ERIC: ED 266 166. 

Education Commission of the States. Math Resource Items for Minimal 
Competency Testing. Denver: The Commission, December 1977. 
ERIC: ED 173 395. 

Fraser, Graham (Ed.). Mathematics Library of Test Items. Sydney, 
Australia: New South Wales Department of Education, July 1978. 
ERIC: ED 218 299. 

Kahn, Henry P. Needed: An Alternative for Mathematics Textbooks. 
School Science and Mathematics 79: 476-478; October 1979. 

Lieberman, Marcus and others. Behavioral Objectives and Test Items 
for (1) Primary Mathematics, (2) Intermediate Mathematics, 
(3) Junior High Mathematics, and (4) High School Mathematics. 
Downers Grove, Illinois: Institute for Educational Research, 
1972. ERIC: ED 066 494, ED 066 495, ED 066 496, ED 066 497. 

Instructional Objectives Exchange. Objectives and Test Items for 
Grades K-9; 10-12. Los Angeles: The Exchange, 1972, 1973. 
ERIC: ED 171 768, ED 171 770, ED 171 773, ED 171 785, 
ED 173 404, ED 173 406. 

National Assessment of Educational Progress. Selected Supplemental 
Mathematics Exercises. National Assessment of Educational 
Progress. Denver: NAEP, October 1977. ERIC: ED 183 388. 

National Assessment of Educational Progress. The Second Assessment 
of Mathematics, 1977-78, Released Exercise Set. Denver: NAEP, 
May 1979. ERIC: ED 187 543. 

North Carolina. Metrics. The Measure of Your Future: Criterion- 
Referenced Metrics Tests, Levels K-8. Raleigh: 
State Department of Public Instruction; Winston-Salem: Winston- 
Salem City Schools, May 1977. ERIC: ED 160 387. 

School Mathematics Study Group. Test Batteries, Description and 
Statistical Properties of Scales: Kindergarten, Grade 1, 
Grade 2, Grade 3. ELMA Technical Reports 1-4. Stanford, 
California: SMSG, 1971. Available on loan from ERIC 
Clearinghouse for Science, Mathematics, and Environmental 
Education. 

Wilson, James W.; Cahen, Leonard S.; and Begle, Edward G. (Eds.). 
Test Batteries for X-Population, Y-Population, and Z-Population. 
NLSMA Reports 1-3. Stanford, California: SMSG, 1968. Available 
on loan from ERIC Clearinghouse for Science, Mathematics, and 
Environmental Education. 

Materials Evaluation 

Heck, William P.; Johnson, Jerry; Kansky, Robert J.; and Dennis, 
Dick. Guidelines for Evaluating Computerized Instructional 
Materials. Reston, Virginia: National Council of Teachers of 
Mathematics, 1984. 

National Council of Teachers of Mathematics. How to Evaluate 
Mathematics Textbooks. Reston, Virginia: The Council, 1982. 

Other Evaluation Techniques 

Ash, Michael J. and Sattler, Howard E. A Video Tape Technique for 
Assessing Behavioral Correlates of Academic Performance. March 
1973. ERIC: ED 074 747. 

Cornett, J. Alternatives to Paper and Pencil Testing. NASSP 
Bulletin 66: 44-46; November 1982. 

Finstein, Phyllis. Color Their Arithmetic. Arithmetic Teacher 
26: 20-22; April 1979. 

Greenius, Eric A. Notebook Evaluation Made Easy! Mathematics 
Teacher 76: 106-107; February 1983. 

Hammitt, Helen. Evaluating and Reteaching Slow Learners. Arithmetic 